Efficient enumeration-selection computational strategy for adaptive chemistry

Design problems of finding efficient patterns, adaptation of complex molecules to external environments, affinity of molecules to specific targets, dynamic adaptive behavior of chemical systems, reconstruction of 3D structures from diffraction data are examples of difficult to solve optimal design or inverse search problems. Nature inspires evolution strategies to solve design problems that are based on selection of successful adaptations and heritable traits over generations. To exploit this strategy in the creation of new materials, a concept of adaptive chemistry was proposed to provide a route for synthesis of self-adapting molecules that can fit to their environment. We propose a computational method of an efficient exhaustive search exploiting massive parallelization on modern GPUs, which finds a solution for an inverse problem by solving repetitively a direct problem in the mean field approximation. One example is the search for a composition of a copolymer that allows the polymer to translocate through a lipid membrane at a minimal time. Another example is a search of a copolymer sequence that maximizes the polymer load in the micelle defined by the radial core-shell potentials. The length and the composition of the sequence are adjusted to fit into the restricted environment. Hydrogen bonding is another pathway of adaptation to the environment through reversible links. A linear polymer that interacts with water through hydrogen bonds adjusts the position of hydrogen bonds along the chain as a function of the concentration field around monomers. In the last example, branching of the molecules is adjusted to external fields, providing molecules with annealed topology, that can be flexibly changed by changing external conditions. The method can be generalized and applied to a broad spectrum of design problems in chemistry and physics, where adaptive behavior in multi-parameter space in response to environmental conditions lead to non-trivial patterns or molecule architectures and compositions. It can further be combined with machine learning or other optimization techniques to explore more efficiently the parameter space.

1. First module is the controlling module. It controls core process such as the I/O data flow between host and device, optimization method and selection strategy to explore the polymer properties: specific max/min objectives, choose of the optimization method, such as genetic algorithms. This module specifies the geometry and discretization of space taking into account the symmetry of the system: lattice parameters, grid dimensions, symmetry of the system (cylindrical, spherical, planar or 3D) and the size of the box.
2. The second module defines force fields and topology of molecules, which is determined by the range of parameters: monomer composition, branching, annealing. In this realization, N * M dimensional array is allocated in the host memory then transferred to the global memory for the Hamiltonian calculation, where N is the total number of monomers in a single molecule (in case it is fixed and not a design parameter) and M is the number of topology parameters associated with the structure of the molecule. More parameters are related to the structure of the molecule and polymer topology: linear or branched polymer, annealed or fixed ration of hydrophobic and hydrophilic blocks, interaction strength and range. As a result, molecule structure is represented by (3 + N )-dimensional array stored in the constant memory, where N is the number of units or beads composing a polymer molecule (polymer length). For a cubic lattice, it contains the description of the geometry and division of the simulation box into set of auxiliary cells.
3. The third module is the sampling generator, which generates the molecule conformations and calculates the interactions according to the Eq 3 of the main text. The code for the generation of molecules according to a given statistical distribution can be changed independently from other parts of the program. In our realization, the chains are gener-2 ated according to GPU parallel version of the Rosenbluth algorithm, Ref. 29 of the main text, providing uncorrelated self-avoiding walks, which can run on each CUDA thread independently, namely, the full computational task of parallel random monomer selection and random self-avoiding walks can be programmed at once in a single GPU kernel. Thus, each polymer conformation is generated with each CUDA thread (Algorithm 1).
In this Algorithm the number of blocks N umBlock is defined as the total number of molecules to be generated divided by the number of threads per block. In this way each confirmation in each block is tagged by the thread number while generated. All the coordinates and weights associated with each conformation are stored in the shared memory.
The first monomer is placed at the center of the coordinate system or it can also be placed at any position in the simulation box depending on realization. The second bond vector is randomly chosen within all z = 26 possible lattice space in 3D space and added to the monomer. Starting from the third monomer, a bias is introduced to the position of a new monomer because the new monomer may overlap with the previous monomers. This is taken into consideration in the so-called Rosenbluth weight, which is the probability of positioning a new monomer avoiding overlaps with previous beads.
For the 3D geometry of cubic lattice, the Rosenbluth weight is cumulatively calculated after the whole chain is generated. During the chain generation, next monomer position is randomly chosen from the z = 26 neighbors of the present bead. This position is accepted if there is no overlap with all previously generated beads.
The random chain growth process will be continued if the chain is self-avoiding. However, if the chain grow to a dead end when all possible nearby cells are occupied by previous monomers, the whole generated sequence, obtained up to this point, must be discarded, and the process starts from the first step again. Repeating these steps a conformation of a self-avoiding polymer molecule of a given length N with a calculated Rosenbluth bias is the result of this simulation of each CUDA thread. Since the length N is the same for all statistical conformations, the threads will finish calculation at roughly the same time.
The interaction with constant fields is also calculated during the chain generation using Eq. 3. Once the chain generation is finished, the total partition function of the system is also obtained from the modules. With this, it gives the access to all the thermodynamic properties of the system. Once all conformations of the chain are generated, the terminal signal is received by Algorithm 1 Lattice Rosenbluth chain generation Require: BlockDim.x is the block dimension defined by user Require: seed is a random number seed chosen by the user Require: chainlength is the polymer chain length chosen by the user Require: Function rngonlattice generates a random position for the next monomer Require: Function distance calculates the distance between two monomers 1: x, y, z ← BlockDim.x * chainlength 2: w ← BlockDim. end for 35: end function the control module and the selection of the optimal sequence can be performed before the generation of the sampling of another sequence.

STATISTICAL ANALYSIS OF A LINEAR POLYMER SEQUENCE
The statistical analysis of the pattern of a linear chain of fixed length N = 12 and variable ratio of hydrophobic and hydrophilic monomers, H/T is shown in Table S1. We studied statistical distribution of the average number of blocks, calculated by counting all H and T blocks and averaging 10 sequences of each group. Another characteristics is the average blocks sizes of H and T, calculated as the total block size divided by the total number of blocks. Both average number of blocks and blocks sizes were calculated from separated two groups: sequences with 10 maximum and 10 minimum values of τ .
There is a significant difference in the number of blocks and the block sizes between the fastest and slowest patterns. The number of blocks of the fastest patterns is much larger than that of the slowest patterns and there is an opposite trend for the block size. Both the number of blocks and the block size change monotonously with H/T ratio. The fastest patterns have a balanced distribution of hydrophobic and hydrophilic blocks with the ratio H/T ∼ 50% and the average block size is close to minimal possible, i.e. 1 monomer.
Statistical analysis of the pattern of the sequences with a balanced ratio H/T=50% is shown in Table S2. Three fastest and three slowest patterns corresponding to a minimum and a maximum of the mean escape time τ of a linear polymer chain with a fixed H/T ratio are shown in Fig. 5 of the main text. The results demonstrate that the fastest patterns consist of large number of alternating hydrophobic and hydrophilic blocks, which are not regular with few blocks grouped together. The slowest blocks resemble di-block copolymers, which are also not regular.
The fastest patterns (Min group) exhibit a positive correlation between the number of hydrophobic monomers T and the average number of blocks N T . The T block size S T is relatively small, ranging between 1.06 and 1.30, with a tendency of increase with the total length of the polymer. The block size of H, N H is slightly bigger than N T and is ranging between 1.20 and 1.36. The slowest patterns (Max group) demonstrate the average number of blocks equal to 1.67 with an average of 3 total blocks. Thus, the translocation behavior is strongly affected by the block size. The block size tends to stay as small as possible to allow for fast translocation, and correspondingly, the number of blocks is relatively high, thus suggesting a design solution in form of an alternating co-polymer with smallest block size.